Abstract


Intelligent Tutoring Systems (ITSs) have begun to develop 
hybrid systems that balance the learning benefits of ITSs 
with the motivational benefits of games. iSTART-ME 
(Motivationally Enhanced) is a new game-based learning 
environment developed on top of an existing ITS for 
reading comprehension (iSTART). In an 11-session lab-based
study, 40 high school students interacted with the full
iSTART-ME system and completed comprehension 
measures at multiple time points (pretest, posttest, retention, 
and transfer). The current work examined students’ 
comprehension outcomes and how they were related to 
performance within three integrated practice methods: 
Coached Practice (non-game), Showdown (game-based), 
and Map Conquest (game-based). Results indicate that 
performance within the game-based practice environments 
was positively related to comprehension outcomes, whereas 
performance within the non-game environment had no 
relation to subsequent comprehension measures. 


Introduction 


Intelligent Tutoring Systems (ITSs) have been producing 
consistent learning gains for decades. However, a common 
problem with these systems is maintaining student 
engagement without reducing learning benefits. This is 
particularly problematic for long-term ITSs that require 
interactions spanning across days, weeks, or even months. 
Student interest within these tutoring systems often wanes 
over time due to the repetitive nature of practice tasks. 

One previously successful solution to improve 
engagement has been to incorporate game-like components 
into educational environments (for a review, see Clark, 
Nelson, Sengupta, & D’Angelo, 2009). Several systems 
have taken this route to create combinations of Intelligent 
Tutoring and Games (McNamara, Jackson, & Graesser, 
2010). One example of this endeavor is the Interactive
Strategy Training for Active Reading and Thinking-
Motivationally Enhanced (iSTART-ME) system, which
was built on top of an existing ITS (called iSTART) and
adapted into a game-based environment where students can
practice strategies, earn points, advance through levels, 
purchase rewards, create a personalized avatar, and play 
educational mini-games. The current work investigates 
students’ interactions with game and non-game based 
versions of strategy practice within iSTART-ME. 


iSTART-ME 


iSTART-ME is a game-based ITS designed to improve 
students’ reading comprehension by teaching self-
explanation in combination with effective reading
strategies. iSTART-ME enhances the original ITS version 
of iSTART by adding in game-based features to maintain 
students’ enjoyment and motivation over an extended 
period of training (Jackson & McNamara, 2011). All 
versions of iSTART are founded on research with a 
successful human intervention called SERT (Self- 
Explanation Reading Training: McNamara, 2004; 
O’Reilly, Taylor, & McNamara, 2006). Students who have 
been through iSTART have shown significant
improvement in reading comprehension, comparable to the 
performance within SERT (Magliano et al., 2005). 

iSTART training is separated into three distinct modules 
that instantiate the pedagogical principle of modeling- 
scaffolding-fading: introduction, demonstration, and 
practice, respectively. The Introduction module contains 
three animated agents that engage in a vicarious dialogue 
to introduce the concept of self-explanation and each of the 
reading strategies. The Demonstration module scaffolds 
the learner from interactive instruction to an applied 
setting. This module includes two animated agents who 
generate and discuss the quality of example self- 
explanations and prompt the learner to identify which 
strategies may have been used. The Practice module fades 
out more of the direct instruction and requires learners to 
generate their own self-explanations while an animated 
agent (Merlin) provides qualitative feedback on how to 
improve the self-explanation quality. 

This feedback during practice is based on a natural 
language assessment algorithm that evaluates each self- 
explanation produced by the students (McNamara, 
Boonthum et al., 2007). The algorithm output is coded as a 
0, 1, 2, or 3. An assessment of “0” relates to self-
explanations that are either too short or contain mostly
irrelevant information. An iSTART score of “1” is 
associated with a self-explanation that primarily relates 
only to the target sentence itself (sentence-based). A “2” 
means that the student’s self-explanation incorporated 
some aspect of the text beyond the target sentence (text- 
based). If a self-explanation earns a “3”, then it is 
interpreted to have incorporated information at a global 
level, and may include outside information or refer to an 
overall theme across the whole text (global-based). This 
algorithm has demonstrated performance comparable to 
that of humans, and provides a general indication of the 
cognitive processing required to generate a self-
explanation (Jackson, Guess, & McNamara, 2010).
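The four score levels described above can be summarized in a small lookup. The following is only an illustrative sketch: the class name, member names, and description strings are hypothetical labels chosen here, and nothing in it reproduces the actual iSTART assessment algorithm, which is not specified in this paper.

```python
from enum import IntEnum


class SelfExplanationLevel(IntEnum):
    """Hypothetical names for the four algorithm outputs (0-3)."""
    TOO_SHORT_OR_IRRELEVANT = 0
    SENTENCE_BASED = 1
    TEXT_BASED = 2
    GLOBAL_BASED = 3


# Descriptions paraphrase the rubric in the text; the wording is illustrative.
LEVEL_DESCRIPTIONS = {
    SelfExplanationLevel.TOO_SHORT_OR_IRRELEVANT: "too short or mostly irrelevant",
    SelfExplanationLevel.SENTENCE_BASED: "relates primarily to the target sentence",
    SelfExplanationLevel.TEXT_BASED: "uses text beyond the target sentence",
    SelfExplanationLevel.GLOBAL_BASED: "uses global or outside information",
}


def describe_score(score: int) -> str:
    """Map a 0-3 score to its rubric description.

    Raises ValueError for any score outside 0-3.
    """
    level = SelfExplanationLevel(score)
    return LEVEL_DESCRIPTIONS[level]
```

For example, `describe_score(2)` returns the text-based description, and a value such as 4 is rejected rather than silently accepted.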

Within iSTART there are two types of practice modules, 
both of which use the same natural language assessment 
algorithm. The first practice module is situated within the 
core context of iSTART (initial 2-hour training) and 
includes two texts. The second practice module is a form 
of extended interaction and operates in the exact same 
manner as the original practice module. During extended 
practice, a teacher can assign specific texts for students to 
read. These texts are either already in the system or can be 
added to the system on short notice. Because of the need to 
incorporate various texts, the iSTART feedback algorithm 
has been designed to adapt to new texts (McNamara, 
Boonthum et al., 2007), with performance comparable to
humans (Jackson, Guess et al., 2010).

The extended practice module is designed to provide a 
long-term learning environment that can span weeks or 
months. Research on iSTART has shown that the extended 
practice effectively increases students’ performance over 
time (Jackson, Boonthum, & McNamara, 2010). Students 
have consistently demonstrated significant improvement in 
reading comprehension, with average effect sizes ranging 
from .68 to 1.12 depending on prior knowledge of the 
learner (McNamara, O’Reilly et al., 2007). However, skill 
mastery requires long-term interaction with repeated 
practice (Jackson, Boonthum et al., 2010). One unfortunate 
side effect of this long-term interaction is that students 
often become disengaged and uninterested in using the 
system (Bell & McNamara, 2007). Thus, iSTART-ME has 
been developed on top of the existing ITS and incorporates 
serious games and other game-based elements (Jackson, 
Boonthum, & McNamara, 2009; Jackson, Dempsey, & 
McNamara, 2010). 


Game-based additions 


To combat the problem of disengagement over time, the 
iSTART extended practice module has been situated 
within a game-based environment called iSTART-ME 
(motivationally enhanced). This game-based environment 
builds upon the existing iSTART system and was 
specifically designed to increase persistence and active 
engagement through extended practice. The iSTART-ME 
system and design rationale have been extensively described
in other papers, so only the relevant aspects will be
described here (Jackson, Boonthum, & McNamara, 2009).

The main focus of the iSTART-ME project is to 
implement and assess game-based principles and features 
that are expected to support effective learning, increase 
motivation, and sustain engagement throughout a long- 
term interaction. The previous version of iSTART 
automatically progressed students from one text to another 
with no intervening actions. The new version, iSTART-ME,
is controlled through a selection menu that gives students
opportunities to interact with new texts, earn points, 
advance through levels, purchase rewards, personalize a 
character, and play educational mini-games (using the 
training strategies). 

In addition to other enhancements, iSTART-ME allows 
students to practice self-explaining within three different 
environments: Coached Practice, Showdown, and Map 
Conquest. These versions of practice situate the same 
task of self-explanation within different contexts. 

Coached Practice (Figure 1) is a revised version of the 
original practice module within iSTART. Learners are 
asked to generate their own self-explanation when 
presented with a text and specified target sentence. 
Students are guided through practice by Merlin, a wizard 
who provides qualitative feedback for user-generated self- 
explanations. Merlin reads sentences aloud to the 
participant, stops after reading a target sentence, and asks 
the participant to self-explain the bolded sentence. After a 
self-explanation is submitted, Merlin provides feedback on 
the quality of the self-explanation using the automatic 
assessment algorithm. If the contribution quality is low, 
students can try again and use Merlin’s feedback to 
improve their current self-explanation. The only game-like 
elements within Coached Practice are a colored qualitative 
feedback bar (visually indicating: poor, fair, good, great) 
and points associated with each self-explanation. 


Figure 1. Screenshot of Coached Practice.


Showdown is a game-based method of practice that 
requires students to generate a self-explanation for a 
specified target sentence (see Figure 2). Participants 
compete against a computer player to win rounds by 
writing better self-explanations. Participants are guided 
through the game by text-based instructions. After the 
participant completes each self-explanation, the computer 
scores the self-explanation on a scale of 0-3 and displays 
the score as stars (using the same algorithm as Coached
Practice). The opponent’s self-explanation is also
presented and scored. The self-explanations for the virtual
player are randomly drawn from a database of existing,
pre-evaluated self-explanations. The self-explanation
scores are compared and the player with the higher score 
wins the round. For tie scores, the player is given another 
sentence worth two points instead of one. The player with 
the most points at the end of a text is declared the winner. 
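The round-scoring rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the game's actual implementation: the function names are hypothetical, a tied round is assumed to award no points, the two-point stake is assumed to apply only until a round is won, and consecutive ties are assumed to keep the stake at two.

```python
def play_showdown(rounds):
    """Tally Showdown points from (student_score, opponent_score) pairs.

    Each score is a 0-3 self-explanation rating. A win is worth one point,
    or two points if the previous round was a tie (assumed reading of the
    tie rule in the text).
    """
    student_points = 0
    opponent_points = 0
    stake = 1
    for student_score, opponent_score in rounds:
        if student_score == opponent_score:
            stake = 2  # the next sentence is worth two points
            continue
        if student_score > opponent_score:
            student_points += stake
        else:
            opponent_points += stake
        stake = 1  # assumption: the stake resets once a round is won
    return student_points, opponent_points


def declare_winner(student_points, opponent_points):
    """Return 'student', 'opponent', or 'tie' for the end-of-text totals."""
    if student_points > opponent_points:
        return "student"
    if opponent_points > student_points:
        return "opponent"
    return "tie"
```

The text does not say how an end-of-text tie is resolved, so `declare_winner` simply reports it.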


Figure 2. Screenshot of Showdown.


Map Conquest is the second game-based method of 
practice in which students generate their own self- 
explanations (Figure 3). In this game, the quality of a 
student’s self-explanation determines the number of dice 
that a student earns. The score of 0 to 3 is created utilizing 
the same natural language assessment algorithm as both 
Coached Practice and Showdown (0-3 dice are awarded 
per response). After a round of generating two self- 
explanations, students are transitioned into a series of map 
actions. When the students interact with the map, they 
attempt to occupy the entire board by allocating resources 
(i.e., troops) to their spaces, conquering neighboring
territories, and defending their own spaces from CPU 
attacks. All of these actions (allocating, conquering, 
defending) occur within a single round of map interactions 
before the player is transitioned back to generate self- 
explanations and earn more dice. In this way, the game 
actions and practice are separate rather than integrated as in
Showdown. This cyclic process between the game and
practice continues until the student completes a self- 
explanation for each of the target sentences within a text. 
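The alternation between practice and map play can be illustrated by tallying the dice earned in each practice round. This sketch assumes the dice from the two self-explanations in a round are simply summed before the map phase; the function name and the handling of a final partial round are assumptions made here for illustration.

```python
def dice_per_round(scores, explanations_per_round=2):
    """Group 0-3 self-explanation scores into practice rounds and return
    the dice earned in each round (one die per score point).

    A text with an odd number of target sentences ends with a shorter
    final round (assumed behavior).
    """
    if any(score not in (0, 1, 2, 3) for score in scores):
        raise ValueError("scores must be integers from 0 to 3")
    return [
        sum(scores[start:start + explanations_per_round])
        for start in range(0, len(scores), explanations_per_round)
    ]
```

Each entry in the returned list corresponds to one round of map actions (allocating, conquering, defending) that follows the practice round.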

Though the surface features of these practice methods 
differ, they all require students to perform the same 
learning task and method of assessment. During training 
with iSTART-ME, students are allowed to freely select
between these forms of practice.


Figure 3. Screenshot of Map Conquest.


Current Study 


Previous research with the iSTART-ME system yielded
positive yet complex patterns of outcomes across
multiple time scales of intervention. The research indicated
that after a short-term interaction (~60 minutes, including 
brief training), students who used a game-based method of 
practice (Showdown) performed worse than students using 
a non-game-based environment (Coached Practice; 
Jackson, Dempsey, & McNamara, 2012). However, in a 
longer-term pilot evaluation with full training (~6 hours 
across multiple sessions), students performed equally well 
using either the game or non-game-based practice 
environment. Therefore, one possible concern with 
integrating games into learning systems is that they have 
the potential to detract from the immediate pedagogical 
goals and reduce learning improvements in the short-term. 
However, across long-term training, the engagement 
fostered by the game environment may compensate for any 
distracting elements, thus allowing students to catch up in 
performance (Jackson et al., 2012).


The current work was conducted to expand upon these 
findings and to more thoroughly explore the potential long- 
term benefits of game-based training environments. 


Procedure 


Participants in this study included 40 high school students 
from a mid-south urban environment. These students are a 
subset of 126 students who originally participated as part 
of a larger study that compared three conditions: iSTART- 
ME, iSTART-original, and no-tutoring control. The current 
work focuses only on those students who were assigned to 
the iSTART-ME condition. 


All students participated in an 11-session experiment 
involving four phases: pretest, training, posttest, and 
retention. During the first session, students completed a 
pretest that included measures to assess their prior reading 
comprehension ability. During the 8 training sessions 
(completed within a 10-day span), participants completed
all of the iSTART training (Introduction, Demonstration, 


and initial Practice) and then transitioned into the iSTART- 
ME selection menu for the remainder of the training 
sessions. Students completed the posttest at session 10, 
which included measures similar to the pretest. Five days 
later, at the final session, students completed two 
comprehension measures, the retention and transfer tests. 


Measures 


All students completed the same number of testing and 
training sessions (i.e., pretest, training, posttest, and 
retention). Each of the three testing sessions included 
measures of student reading comprehension, and both the 
pretest and posttest had additional measures pertaining to 
students’ attitudes, motivation, and enjoyment. 


Students’ reading comprehension was assessed using 
passage-specific comprehension measures. At pretest, 
posttest, and retention, students were asked to read and
self-explain a science passage of approximately 300 words 
covering one of the following topics: red blood cells, 
cellular growth & repair, or earthquakes/plate tectonics. 
These passages have been used during previous research 
with iSTART and were selected due to their similarity on 
linguistic difficulty measures via Coh-Metrix (Graesser et 
al., 2004). After reading and self-explaining the passage, 
students were presented with a series of open-ended 
questions (the text was not available after the questions 
appeared). Two types of questions were included and 
correct responses required information from either the 
textbase or situation model levels (e.g., McNamara et al., 
1996; McNamara et al., 2006). The textbase questions 
involved information that could be found within a single 
sentence, whereas the situation model questions required 
students to generate an inference from information 
contained in at least two separate sentences. These two 
types of questions were designed to detect the level of 
comprehension most affected by training. In addition to 
these three science passages, students also completed a 
transfer comprehension assessment during the delayed 
retention test. This transfer task included a longer text of 
600 words (on plant growth and development) and students 
were not prompted to self-explain while reading. 


Results 


The following analyses were conducted to examine 
students’ choices to select particular environments during 
training, and how performance within those environments 
relates to overall comprehension. Thus, analyses examined 
the frequency with which each practice method was 
selected and how performance within each environment 
related to comprehension outcomes. 


For each practice text, students were free to choose 
between the three different practice environments. 
Analyses investigating the frequency of environment 
selection found no differences between Coached Practice 
(M=7.45, SD=4.67), Showdown (M=5.45, SD=3.79), or 
Map Conquest (M=8.31, SD=5.66), through parametric, 
F(2,56)=2.41, p>.05, and non-parametric tests, χ²(2)=3.88,
p>.05. Additionally, there were no environment selection
frequency differences between students of high or low 
pretest comprehension abilities, F(2,54)=0.42, p>.05. 
Therefore, students were equally likely to interact with any 
of the three practice methods during training. 


Once students selected a practice environment, their self-
explanation performance was assessed using the same
iSTART algorithm. Pearson correlations indicate that 
students’ pretest comprehension has a significant positive 
relation to their performance within all three practice 
environments (see Pretest column in Table 1). 
Interestingly, performance within the two game-based 
methods of practice (Showdown and Map Conquest) is 
significantly positively related to comprehension measures 
at posttest, retention, and transfer (all p<.01). However, 
performance within the non-game method of practice 
(Coached Practice) was not related to comprehension 
scores at posttest, retention, or transfer (all p >.05). 
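For readers less familiar with the statistic, the Pearson correlation used in these analyses can be computed as in the sketch below. The function name and the input format (one practice score and one comprehension score per student) are assumptions for illustration; the paper does not describe the software used for the analyses.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations around each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cross / math.sqrt(ss_x * ss_y)
```

Here `xs` would hold each student's average score within one practice environment and `ys` the same students' scores on one comprehension measure.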


Table 1. Correlations for overall comprehension outcomes. 


                     Pretest   Posttest  Retention  Transfer
CP Performance  r    .332*     .276      .212       .237
                (p)  (.039)    (.088)    (.194)     (.147)
SD Performance  r    .654**    .516**    .481*      .602**
                (p)  (.000)    (.002)    (.005)     (.000)
MC Performance  r    .535**    .563**    .496**     .532**
                (p)  (.001)    (.000)    (.002)     (.001)


CP = Coached Practice, SD = Showdown, MC = Map Conquest 


We were further interested in how each practice 
environment may affect comprehension at different levels 
(i.e., textbase vs. inference questions). Correlation analyses 
revealed findings similar to the overall comprehension 
results (see Table 2). Performance within Map Conquest 
was positively related to all levels of students’ 
comprehension at pretest, posttest, retention, and transfer. 
Performance within Showdown was related to students’ 
bridging questions at retention, and all of transfer. Finally, 
performance within Coached Practice was related to 
students’ pretest textbase questions, but was not related to 
any other measure of comprehension. 


To explore these different effects further, median splits 
were created based on the performance within each 
practice environment (i.e., three separate median splits 
based on average performance). These median splits were 
used as a between-subjects factor to examine if high 
performing students within an environment maintained that 
ability on subsequent comprehension outcomes. A series of 
nine ANOVAs were conducted (which requires a 
Bonferroni corrected alpha of .0056 to reach significance) 
and found that students of high and low ability within 
Coached Practice did not have significantly different 
scores at posttest (p>.05), retention (p>.01), or transfer 
(p>.05). These tests also found that high and low 
performing students within Showdown did not differ at 
retention (p>.006), but high performing students scored 
higher on the posttest (p<.002) and transfer task (p<.003) 
than low performing students. Additionally, the final three
ANOVAs found that high performing students within Map
Conquest scored significantly higher than low performing
students at posttest (p<.001), retention (p<.003), and
transfer (p<.001). These results demonstrate that students’
performance within the two game-based environments
persisted across later tests of comprehension, whereas
students’ performance within Coached Practice did not
relate to comprehension outcomes. Based on these
findings, the game-based environments seem to be
engaging the students differently than the non-game
environment and may provide a more accurate assessment
of overall comprehension abilities.


Table 2. Correlations for types of comprehension questions.


                     Pretest              Posttest             Retention            Transfer
                     Text      Inference  Text      Inference  Text      Inference  Text      Inference
CP Performance  r    .344*     .257       .210      .278       .237      .168       .208      .242
                (p)  (.032)    (.120)     (.198)    (.087)     (.146)    (.308)     (.205)    (.138)
SD Performance  r    .670**    .540**     5S1ll*    .426       .342      .544**     .547**    .588**
                (p)  (.000)    (.001)     (.002)    (.014)     (.051)    (.001)     (.001)    (.000)
MC Performance  r    .471**    .571**     .514**    .502*      .374*     .542**     .495**    .516**
                (p)  (.003)    (.000)     (.001)    (.002)     (.023)    (.001)     (.002)    (.001)


CP = Coached Practice, SD = Showdown, MC = Map Conquest
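The median-split grouping and the Bonferroni threshold used in the ANOVAs above can be reproduced with a few lines. This is a sketch under assumptions: the function names are hypothetical, and students scoring exactly at the median are placed in the low group because the text does not say how such cases were handled.

```python
from statistics import median


def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def median_split(scores):
    """Label each score 'high' or 'low' relative to the sample median.

    Scores equal to the median are labeled 'low' (an assumption).
    """
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]
```

With a family-wise alpha of .05 and nine tests, `bonferroni_alpha(0.05, 9)` gives approximately .0056, matching the corrected alpha reported above.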


Conclusions 


The overarching goal of the iSTART-ME project has been 
to further our understanding of the benefits of adding 
game-based elements to ITSs. The current study focused 
on examining students’ interactions within multiple 
practice environments, and how performance within those 
environments relates to potential improvements in primary 
learning outcomes (i.e., comprehension). The outcomes 
from this study provide an interesting within-subjects 
comparison between game and non-game based learning, 
using equivalent metrics of assessment across 
environments. 


Results indicate that students’ performance within the 
two game-based methods of practice was related to 
multiple comprehension outcomes. Thus, students who 
implemented the strategies successfully within the games 
also tended to apply those strategies well during 
subsequent comprehension tasks (even after a delay). In 
contrast, students’ performance within the non-game based 
practice was not indicative of subsequent comprehension 
outcomes. This means that students did not apply the 
strategies in the same manner within this environment as 
they did during comprehension evaluations after training. 


One potential explanation for this finding is the
scaffolded nature of the non-game practice environment.
During the non-game-based method of practice (Coached 
Practice), students are provided with formative feedback 
and are scaffolded through the interaction. In contrast, the 
game-based methods of practice (Showdown and Map 
Conquest) incorporate more implicit forms of feedback 
using student examples and game-based features. 
Therefore, it is likely that students learn strategies within
Coached Practice (e.g., see Jackson, Boonthum et al.,
2010), but performance under the additional coaching from
the animated agent does not reflect students’ true ability to
apply the strategies. Rather, these results suggest that the
unscaffolded interactions during gameplay are a more
accurate measure of subsequent performance and are
indicative of transfer on comprehension tests. This
assessment holds true at multiple levels of analyses: overall 
comprehension, textbase comprehension, and inference- 
based comprehension. Thus, performance within these 
games can provide feedback to students on their overall 
abilities, and the results support the current practice
environment design as a valid assessment of students’
comprehension abilities. 


The current long-term evaluation goes beyond 
immediate short-term findings to explore the effects of 
games during prolonged interactions and skill acquisition 
(i.e., comprehension skills). This finding supports long- 
term learning trends from previous work (Jackson et al.,
2012), and creates a promising foundation from which we 
can extend subsequent work and further contribute to the 
scientific research on game-based learning. 


The development of iSTART-ME has allowed us to 
examine the effectiveness of a combined ITS and 
educational game system, as well as to more systematically 
evaluate the effects of game components in the context of 
an ITS. The system design provides a unique opportunity 
to simultaneously examine game and non-game
environments that have the same underlying assessment 
metrics and pedagogical goals. Future work with iSTART- 
ME will include both global and local assessments of 
game-based performance, and further analyses of user 
performance, enjoyment, attitudes, engagement, and 
persistence across different time scales. Additionally,
several small-scale experiments are being implemented to 
examine the interactions between specific game 
components (e.g., teasing apart differences between 
Showdown and Map Conquest). 


Both the current and future work with iSTART-ME help
to further the field of Intelligent Tutoring Systems and 
game-based learning. The design of iSTART-ME is 
modular and thus provides an interesting theoretical 
alternative to the growing number of fully immersive 
epistemic games. Ultimately, we expect hybrid ITS and 
game-based learning environments to dramatically impact 
the effectiveness of computer-based training and further 
our understanding of the complex motivational aspects of 
learning environments and their interplay with learning. 